21 research outputs found

    Design and performance evaluation of a web-based multi-tier federated system for a catalogue of life

    No full text
    In the SPICE (SPecies 2000 Interoperability Co-ordination Environment) project, we are designing and evaluating a web-based multi-tier federated system, intended as a scalable infrastructure for a globally distributed federated database of biological knowledge. It is designed to harness specialist expertise on classification of individual groups of organisms to form a 'catalogue of life' of high taxonomic quality, and to solve problems of heterogeneity, scale and component database unreliability while providing a reconfigurable, flexible, cost-effective Internet online gateway to the distributed catalogue. This paper outlines our design and design rationale, explaining how our multi-tier federated approach makes maintenance of a consistent classification easier. We present the conceptual architecture and several intelligent agents prototyped to improve system performance with respect to scalability, stability and adaptability, and discuss the evaluation of this system. Our most important finding is that a knowledge-based search using suitable caching techniques will effectively reduce the response time unless the core system is excessively loaded.
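    The caching finding above can be illustrated with a minimal sketch. It is not the SPICE implementation: the `CachingGateway` class, its `backends` mapping and the `backend_calls` counter are hypothetical names chosen for illustration, and a plain in-memory dictionary stands in for whatever caching technique the authors evaluated.

```python
class CachingGateway:
    """Hypothetical gateway that fans a query out to component databases
    and remembers the merged answer, so repeated queries skip the backends."""

    def __init__(self, backends):
        # backends: mapping of database name -> callable(query) -> list of hits
        self.backends = backends
        self.cache = {}
        self.backend_calls = 0  # counts how often a component database is contacted

    def search(self, query):
        key = query.strip().lower()  # normalise so equivalent queries share an entry
        if key in self.cache:
            return self.cache[key]
        results = []
        for name in sorted(self.backends):
            self.backend_calls += 1
            results.extend(self.backends[name](key))
        self.cache[key] = results
        return results


# Usage: two stand-in component databases, queried twice with the same term.
gateway = CachingGateway({
    "db_one": lambda q: [f"db_one:{q}"],
    "db_two": lambda q: [f"db_two:{q}"],
})
first = gateway.search("Aus bus")
second = gateway.search("  aus BUS ")
```

    The second call is answered from the cache, so the component databases are contacted only during the first call.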

    Assisting the integration of taxonomic data: The LITCHI toolkit

    No full text
    The prototype toolkit, called LITCHI, uses constraints and constraint violation repair techniques to enable the automated detection and, where possible, the automated resolution of conflicts in taxonomic databases. The LITCHI software has been used successfully on various test sets. LITCHI has led to the discovery that individual checklists are less consistent than had been anticipated, although its initial purpose was to detect conflicts between distinct checklists.

    Conflict detection for integration of taxonomic data sources

    No full text
    The LITCHI project aims to assist biologists in the integration of databases by searching for conflicts within taxonomic checklists. In order to detect such conflicts, a formal model of taxonomic practice has been created and used as the basis for a prototype tool that uses Prolog to search for naming conflicts within a relational database of checklists. The prototype tool is already proving its worth by detecting conflicts and errors within real taxonomic checklists.

    Techniques for effective integration, maintenance and evolution of species databases

    No full text
    The LITCHI project is concerned with the integration and maintenance of databases of biological knowledge organised by species. We use constraints pertaining to good taxonomic practice in order to identify taxonomic conflicts in individual species databases and in databases formed by merging species databases from distinct sources. The LITCHI system can be used to resolve such conflicts incrementally. As the project has progressed, we have identified a number of distinctive features of the problem domain, and needs of the intended users, which have had a significant impact on the techniques and modes of operation that we found to be appropriate, especially in contrast with applications that handle rapidly-accumulating 'raw' data. It is upon these aspects of LITCHI that we concentrate in the present paper, viewing LITCHI as an example of the more general problem of merging scientific data sets in which conflicts between the terminology used can occur.

    Techniques for effective integration, maintenance and evolution of species databases

    No full text
    The LITCHI project is concerned with the integration and maintenance of databases of biological knowledge organized by species. We use constraints pertaining to good taxonomic practice in order to identify taxonomic conflicts in individual species databases and in databases formed by merging species databases from distinct sources. The LITCHI system can be used to resolve such conflicts incrementally. As the project has progressed, we have identified a number of distinctive features of the problem domain, and needs of the intended users, which have had a significant impact on the techniques and modes of operation that we found to be appropriate, especially in contrast with applications that handle rapidly-accumulating 'raw' data. It is upon these aspects of LITCHI that we concentrate in the present paper, viewing LITCHI as an example of the more general problem of merging scientific data sets in which conflicts between the terminology used can occur.

    Spice: A flexible architecture for integrating autonomous databases to comprise a distributed catalogue of life

    No full text
    In the SPICE project we are building a distributed catalogue of life, which will eventually be formed from up to 200 autonomous taxonomic databases. We are faced with a number of challenges, which include the scalability of the system; the accommodation of partial or missing data; queries which are potentially very expensive computationally, where it is difficult to determine which databases will contain data matching the queries; and the effective integration of heterogeneous databases at the knowledge level. In this paper we present the architecture on which SPICE is being built, and we explain how, within our SPICE architecture, we will be able to explore and develop new techniques to enhance access to the SPICE distributed database.

    LITCHI: knowledge integrity testing for taxonomic databases

    No full text
    The Logic-based Integration of Taxonomic Conflicts in Heterogeneous Information Systems (LITCHI) project is initiated with the aim of developing software to enable the automated detection and, where possible, resolution of conflicts in taxonomic checklists. To support this project, a formal model is constructed of the way scientific names are used to denote taxa in common taxonomic practice. The model is then used to derive sets of Prolog rules which will detect conflicts in taxonomic checklists stored in a relational DBMS.

    Conflict detection for integration of taxonomic data sources

    No full text
    Over recent years, international initiatives such as the 1993 UN Convention on Biological Diversity have highlighted the need for information about species diversity on a global scale. However, attempts to build global information systems by integrating smaller, independently created biodiversity databases have been hampered by differences in the sets of species names used. Some databases use different names to refer to the same species, while in other cases the same name can be applied to differing definitions of a species, or even entirely different species. The LITCHI project aims to assist biologists in the integration of databases by searching for conflicts within taxonomic checklists (i.e. lists of the species names used in a database and the relationships between them). In order to detect such conflicts, we have created a formal model of taxonomic practice, which describes (amongst other things) what it means for a checklist to be consistent and well-specified. This model has been used as the basis for a prototype tool that uses Prolog to search for naming conflicts within a relational database of checklists. We describe the background to our formal model and show how it has been used to implement the LITCHI system. Our prototype tool is already proving its worth by detecting conflicts and errors within real taxonomic checklists.
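    One kind of naming conflict described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the LITCHI model or its Prolog rules: the record layout `(checklist, name, status, accepted_name)`, the function `find_naming_conflicts` and the placeholder names "Aus bus" and "Aus cus" are all hypothetical. The single rule checked is that a name should not be accepted in one place while being listed as a synonym of another name elsewhere.

```python
from collections import defaultdict


def find_naming_conflicts(records):
    """Report names that are accepted somewhere and treated as a synonym elsewhere.

    records: iterable of (checklist, name, status, accepted_name) tuples,
    a hypothetical layout where status is "accepted" or "synonym".
    Returns a list of (name, accepting_checklist, synonym_checklist, target) tuples.
    """
    accepted_in = defaultdict(set)   # name -> checklists that accept it
    synonym_in = defaultdict(dict)   # name -> {checklist: name it is a synonym of}
    for checklist, name, status, accepted_name in records:
        if status == "accepted":
            accepted_in[name].add(checklist)
        elif status == "synonym":
            synonym_in[name][checklist] = accepted_name

    conflicts = []
    for name in sorted(synonym_in):
        for syn_checklist, target in sorted(synonym_in[name].items()):
            # .get avoids creating empty entries in the defaultdict
            for acc_checklist in sorted(accepted_in.get(name, ())):
                conflicts.append((name, acc_checklist, syn_checklist, target))
    return conflicts


# Usage with placeholder names: checklist A accepts "Aus bus",
# while checklist B treats it as a synonym of "Aus cus".
records = [
    ("A", "Aus bus", "accepted", "Aus bus"),
    ("B", "Aus bus", "synonym", "Aus cus"),
    ("B", "Aus cus", "accepted", "Aus cus"),
]
conflicts = find_naming_conflicts(records)
```

    Because the check does not require the two checklists to differ, the same rule also flags inconsistencies inside a single checklist.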

    Experiences with a hybrid implementation of a globally distributed federated database system

    No full text
    The SPICE project is developing a globally distributed federated database of biological knowledge, forming a ‘catalogue of life’ by harnessing specialist expertise on classification of groups of organisms. The component databases are heterogeneous, and are joined to the federation in various ways. We explain how our federated approach partitions the task of maintaining a consistent classification into manageable sub-tasks. We use both CORBA and XML and, while CORBA is widely used for interoperable systems and XML is attractive for data exchange, some problems have arisen in practice. We discuss the problems encountered when incorporating CORBA ORBs from multiple vendors, compromising true platform independence. We also discuss the nontrivial effort required to achieve stability in CORBA-based systems, despite the benefits offered by CORBA in this respect. We present preliminary results, illustrating how performance is affected by various implementational choices.

    Strategies for the sustainability of online open-access biodiversity databases

    No full text
    Highlights: • Open-access online scholarly biodiversity databases are threatened by a lack of funding and institutional support. • Strategic approaches to aid sustainability are summarised. • Issues include database coverage, quality, uniqueness; clarity of Intellectual Property Rights, ownership and governance. • Long-term support from institutions and scientists is easier for high-quality, comprehensive, prestigious global databases. • Larger multi-partner governed databases are more sustainable; i.e. ‘bigger (multi-partner) databases are better’. Abstract: Scientists should ensure that high-quality research information is readily available on the Internet so society is not dependent on less authoritative sources. Many scientific projects and initiatives publish information on species and biodiversity on the World Wide Web without users needing to pay for it. However, these resources often stagnate when project funding expires. Based on a large pool of experiences worldwide, this article discusses what measures will help such data resources develop beyond the project lifetime. Biodiversity data, just as data in many other disciplines, are often not generated automatically by machines or sensors. Data on species, for example, are based on human observations and interpretation. This requires continuous data curation to keep them up to date. Creators of online biodiversity databases should consider whether they have the resources to make their database of such value that other scientists and/or institutions would continue to finance its existence. To that end, it may be prudent to engage such partners in the development of the resource from an early stage. Managers of existing biodiversity databases should reflect on the factors that are important for sustainability. These include the extent, scope, quality and uniqueness of database content; track record of development; support from scientists; support from institutions; and clarity of Intellectual Property Rights.
    Science funders should give special attention to the development of scholarly databases with expert-validated content. The science community has to appreciate the efforts of scientists in contributing to open-access databases, including by citing these resources in the reference lists of publications that use them. Science culture must thus adapt its practices to support online databases as scholarly publications. To sustain such databases, we recommend they should (a) become integrated into larger collaborative databases or information systems with a consequently larger user community and pool of funding opportunities, and (b) be owned and curated by a science organisation, society, or institution with a suitable mandate. Good governance and proactive communication with contributors are important to maintain the team enthusiasm that launched the resource. Experience shows that ‘bigger is better’ in terms of database size because the resource will have more content, more potential and known uses and users of its content, more contributors, be more prestigious to contribute to, and have more funding options. Furthermore, most successful biodiversity databases are managed by a partnership of individuals and organisations.